A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

نویسندگان

  • Mohammad M. Masud
  • Jing Gao
  • Latifur Khan
  • Jiawei Han
  • Bhavani M. Thuraisingham
چکیده

We propose a multi-partition, multi-chunk ensemble classifier based data mining technique to classify concept-drifting data streams. Existing ensemble techniques in classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multipartition, multi-chunk ensemble technique, we significantly reduce classification error compared to the single-partition, single-chunk ensemble approaches.Wehave theoretically justified the usefulness of our algorithm, and empirically proved its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Concept-Drifting Data Stream to Detect Peer to Peer Botnet Traffic

We propose a novel stream data classification technique to detect Peer to Peer botnet. Botnet traffic can be considered as stream data having two important properties: infinite length and drifting concept. Thus, stream data classification technique is more appealing to botnet detection than simple classification technique. However, no other botnet detection approaches so far have applied stream...

متن کامل

A Data Intensive Multi-chunk Ensemble Technique to Classify Stream Data Using Map-Reduce Framework

We propose a data intensive and distributed multichunk ensemble classifier based data mining technique to classify data streams. In our approach, we combine r most recent consecutive data chunks with data chunks in the current ensemble and generate a new ensemble using this data for training. By introducing this multi-chunk ensemble technique in a Map-Reduce framework and considering the concep...

متن کامل

A New Framework for Data Streams Classification

Mining data streams has recently become an important and challenging task for a wide range of services, including credit card fraud detection, sensor networks and web applications. In these applications data do not typically take the form of persistent relations, but tend to arrive in multiple, continuous, rapid and timevarying data streams. Hence, conventional knowledge discovery tools cannot ...

متن کامل

Mining Multi-Label Data Streams Using Ensemble-Based Active Learning

Data stream classification has drawn increasing attention from the data mining community in recent years, where a large number of stream classification models were proposed. However, most existing models were merely focused on mining from single-label data streams. Mining from multi-label data streams has not been fully addressed yet. On the other hand, although some recent work touched the mul...

متن کامل

Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers

In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most of these approaches can only be applied to uni-dimensional classification problems where each input instance has to be assigned to a single output class variable. The problem of mining multi-dimensional data streams, which includes mu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009